Improving case definition of Crohn's disease and ulcerative colitis in electronic medical records using natural language processing: a novel informatics approach.
Abstract
BACKGROUND Previous studies identifying patients with inflammatory bowel disease using administrative codes have yielded inconsistent results. Our objective was to develop a robust electronic medical record-based model for classification of inflammatory bowel disease leveraging the combination of codified data and information from clinical text notes using natural language processing.
METHODS Using the electronic medical records of 2 large academic centers, we created data marts for Crohn's disease (CD) and ulcerative colitis (UC) comprising patients with ≥1 International Classification of Diseases, 9th edition, code for each disease. We used codified (i.e., International Classification of Diseases, 9th edition codes, electronic prescriptions) and narrative data from clinical notes to develop our classification model. Model development and validation were performed in a training set of 600 randomly selected patients for each disease with medical record review as the gold standard. Logistic regression with the adaptive LASSO penalty was used to select informative variables.
RESULTS We confirmed 399 CD cases (67%) in the CD training set and 378 UC cases (63%) in the UC training set. For both, a combined model including narrative and codified data had better accuracy (area under the curve for CD 0.95; UC 0.94) than models using only disease International Classification of Diseases, 9th edition codes (area under the curve 0.89 for CD; 0.86 for UC). Addition of natural language processing narrative terms to our final model resulted in classification of 6% to 12% more subjects with the same accuracy.
CONCLUSIONS Inclusion of narrative concepts identified using natural language processing improves the accuracy of electronic medical records case definition for CD and UC while simultaneously identifying more subjects compared with models using codified data alone.
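The area under the curve used above as the accuracy measure can be read as the probability that a randomly chosen chart-confirmed case receives a higher model score than a randomly chosen non-case. The following is a minimal sketch of that calculation, not the authors' code: the function name `auc` and the toy scores and labels are illustrative assumptions and are not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney).

    scores: model scores, higher meaning "more likely a case".
    labels: 1 for a confirmed case, 0 for a non-case.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above non-case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Illustrative values only, not data from the study.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(auc(toy_scores, toy_labels))  # 0.75
```

In the toy example, three of the four case/non-case pairs are ordered correctly, so the value is 0.75. The pairwise loop is quadratic in the number of patients, which is acceptable for a training set of a few hundred charts.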
Similar resources
The Study of Upper Gastrointestinal Endoscopy in Patients with Inflammatory Bowel Disease and Ulcerative Colitis
Background and aims: Upper gastrointestinal endoscopy is one of the diagnostic tools for inflammatory bowel disease and also helps in the differential diagnosis of unspecified colitis. The aim of this study was to investigate the necessity of upper gastrointestinal endoscopy in patients with inflammatory bowel disease. Materials and Methods: In this descriptive cross-sectional...
Improving Case Definition of Crohn's Disease and Ulcerative Colitis in Electronic Medical Records Using Natural Language Processing
Introduction—Prior studies identifying patients with inflammatory bowel disease (IBD) utilizing administrative codes have yielded inconsistent results. Our objective was to develop a robust electronic medical record (EMR) based model for classification of IBD leveraging the combination of codified data and information from clinical text notes using natural language processing (NLP). Methods—Usi...
Pyostomatitis Vegetant: An Important Diagnostic in Oral Diseases (Two Case Reports)
Background and Objectives: Inflammatory bowel disease (IBD) is a term that refers to Crohn's disease (CD) and ulcerative colitis (UC). Oral manifestations in this disease category precede the onset of gastrointestinal symptoms. In many patients, intestinal symptoms may be minimal or remain undiagnosed. In this paper, two cases of pyostomatitis vegetans have been investigated. Case Report: T...
Serum Interleukin-23 Levels in Patients with Ulcerative Colitis
Background: Patients with ulcerative colitis are at increased risk of inflammation. Interleukin 23 (IL-23) is a newly identified cytokine with increased expression in inflamed biopsies of colon mucosa in patients with Crohn's disease; however, there is inconsistent evidence on its role in ulcerative colitis. Objective: We aimed to compare serum IL-23 level in patients with ulcerative colitis an...
Utility And Metrics Of Natural Language Processing On Identifying Patients For Pharmacoepidemiologic Studies.
Objective Electronic medical records (EMR) are increasingly utilized in clinical practice and research, allowing for more efficient availability of rich patient records. However, most use of EMR is limited to coded, structured, administrative data, while the vast majority of patient information (e.g. disease subtype, severity, medical device usage, etc.) is tied up in narrative clinical notes. ...
Journal title:
- Inflammatory bowel diseases
Volume 19, Issue 7
Pages -
Publication date 2013